Abstract. A particular challenge in the area of social media analysis is
how to find communities within a larger network of social interactions.
Here a community may be a group of microblogging users who post content
on a coherent topic, or who are associated with a specific event or
news story. Twitter provides the ability to curate users into lists,
corresponding to meaningful topics or themes. Here we describe an approach
for crowdsourcing the list building efforts of many different Twitter users,
in order to identify topical communities. This approach involves the use
of ensemble community finding to produce stable groupings of user lists,
and by extension, individual Twitter users. We examine this approach
in the context of a case study surrounding the detection of communities
on Twitter relating to the London 2012 Olympics.

1 Introduction 

A wide variety of community finding techniques have been proposed in the
literature, with recent research focusing on the challenge of identifying
overlapping communities [1]. In the case of microblogging data, researchers have
been interested in the identification of communities of users on Twitter, who
produce tweets on a common topic, who belong to the same demographic, or who
share a common ideological viewpoint [2]. These approaches have generally relied
on explicit views of the Twitter network, such as follower relations or retweets.

Twitter users can organise the accounts that they follow into Twitter user
lists, as shown in Fig. 1. These lists are used in a variety of ways. In some cases
they may correspond to personal lists of a given user's friends and family, but
frequently lists are employed to group together Twitter accounts based on a
common topic or theme. In this way, every Twitter user can effectively become
a community curator. Notably, journalists from news organisations such as The
Telegraph and Storyful curate lists relevant to a given news story or event,
as a means of monitoring breaking news. Recently, Kim et al. and García-Silva
et al. [3, 4] both discussed the potential of user lists to provide latent annotations
for Twitter user profiles, while Wu et al. [5] suggested user lists as a means of
harnessing the "wisdom of the crowds" on Twitter.

Our primary goal here is to demonstrate that topical communities can be
identified by harnessing the "crowd-sourced" list building efforts of a large base
of Twitter users. In Section 3, we show that this can be done by constructing
a graph based on the similarity of user list memberships, and then using an
ensemble community finding approach to find robust, overlapping groups of lists
within this graph, from which user communities can be derived. We use stability
information derived from the ensemble as a proxy for the reliability of the
communities. In Section 4, we evaluate the proposed techniques on a case study
relating to coverage of the London 2012 Olympics on Twitter.

Fig. 1. User list, curated by The Telegraph, covering athletes, journalists, and
organisations involved in the London 2012 Summer Olympics.



2 Related Work 

2.1 Microblogging Analysis 

Many researchers have become interested in exploring the network structure
within the Twitter network, given the potential for Twitter to facilitate the
rapid spread of information. Java et al. [2] provided an initial analysis of the
early growth of the network, and also performed a small-scale evaluation that
indicated the presence of distinct Twitter user communities, where the members
share common interests as reflected by the terms appearing in their tweets.
Kwak et al. [6] performed an evaluation based on a sample of 41.7 million users
and 106 million tweets from a network mining perspective. The authors studied
aspects such as identifying influential users, information diffusion, and trending
topics. Shamma et al. [7] performed an analysis on microblogging activity during
the 2008 US presidential debates. The authors demonstrated that frequent
terms reflected the topics being discussed, but the use of informal vocabulary
complicated topic identification.

Typically researchers have focused either on Twitter users from the perspective
of the content that they produce, or in terms of explicit network representations
based on follower relations or retweeting activity [6]. However, preliminary
work by Kim et al. [3] suggested that latent groups and relations in Twitter
data could be extracted by examining user list data. Wu et al. [5] suggested
that user list memberships could be used to organise users into a pre-defined
set of categories: celebrities, media, organisations, and blogs. Having classified
the users, important or "elite" users within each category were identified based
on the number of lists to which each user was assigned. García-Silva et al. [4]
described approaches for extracting semantic relations from user lists, by
constructing relations between co-occurring keywords taken from list names. In the
context of list curation, Greene et al. [8] showed that co-listed links could be used
as one potential way to recommend users who may belong to the "community"
surrounding a breaking news story.

2.2 Community Finding 

Many different algorithms have been proposed to identify communities in graphs,
based on different combinations of objective functions and search strategies [1].
Recently, considerable focus has been given to identifying communities in
highly-overlapping, weighted networks. A widely-employed algorithm in this area is
OSLOM (Order Statistics Local Optimization Method), introduced by
Lancichinetti et al. [9]. Kwak et al. [10] observed that many community detection
algorithms can produce inconsistent results, due to stochastic elements in their
optimisation process. Lancichinetti & Fortunato [11] demonstrated that this also
applied to OSLOM, and proposed an ensemble approach to generate stable
results out of a set of multiple partitions.

In the more general cluster analysis literature, ensemble clustering methods
have been previously developed to address such issues. These methods typically
involve generating a diverse set of "base clusterings", which are then aggregated
to produce a consensus solution [12-14]. The most popular aggregation strategy
has been to use information derived from different clusterings to determine the
level of association between each pair of items in a dataset [12, 13]. This strategy
was motivated by the observation that pairwise co-assignments, averaged over
a sufficiently large number of clusterings, may be used to induce a new, more
robust measure of similarity on the data.

3 Methods 

In this section, we introduce an approach that aggregates user list information
to generate communities. Firstly, we describe the construction of a graph
representation of user lists, based on their membership overlaps. Then in Section 3.2
we describe an ensemble approach to identify overlapping groups of user lists.
The stability of these groups is assessed as described in Section 3.3, and the
selection of community labels is discussed in Section 3.4. Finally, the derivation
of corresponding communities for individual users is discussed in Section 3.5.



3.1 User List Graph Construction 

We construct a graph $G$ of $l$ nodes, where each node represents a distinct Twitter
user list $L_x$. A weighted edge exists between a pair of lists if they share users in
common. Rather than using the raw intersection size between a pair, we make
allowance for the significance of the intersection size relative to the size of the
two lists, and the total number of users assigned to lists $n$. For a pair $(L_x, L_y)$,
we compute a p-value giving the probability of observing, by chance, at least
$|L_x \cap L_y|$ users from $L_x$ within another list of size $|L_y|$ (the upper tail of a
hypergeometric distribution):

$$PV(L_x, L_y) = 1 - \sum_{j=0}^{|L_x \cap L_y| - 1} \frac{\binom{|L_x|}{j} \binom{n - |L_x|}{|L_y| - j}}{\binom{n}{|L_y|}} \tag{1}$$

To improve interpretability, we compute the associated log p-value:

$$LPV(L_x, L_y) = -\log\left(PV(L_x, L_y)\right) \tag{2}$$

where a larger value is more significant. We consider Eqn. 2 as a measure of the
similarity between a pair of user lists, corrected for chance. To further increase
the sparseness of the graph, we remove edges with weights $LPV < \rho$ for a weight
threshold $\rho$. Increasing the value of $\rho$ will result in an increasingly sparse graph.
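
As a rough illustration of Eqns. 1 and 2, the edge weighting can be sketched in Python as follows. This is a minimal sketch rather than the implementation used in our experiments: the function names, the dictionary-of-sets input format, and the use of a base-10 logarithm are all assumptions made for the example. Exact rational arithmetic is used so that very small p-values do not underflow.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, log10


def overlap_p_value(size_x: int, size_y: int, overlap: int, n: int) -> Fraction:
    """Eqn. 1: chance of seeing at least `overlap` users of L_x in a list of size |L_y|."""
    total = comb(n, size_y)
    # Probability mass for intersections strictly smaller than the observed one.
    below = sum(
        comb(size_x, j) * comb(n - size_x, size_y - j) for j in range(overlap)
    )
    return 1 - Fraction(below, total)


def log_p_value(size_x: int, size_y: int, overlap: int, n: int) -> float:
    """Eqn. 2, assuming a base-10 logarithm (the base is an assumption here)."""
    pv = overlap_p_value(size_x, size_y, overlap, n)
    # Take logs of numerator and denominator separately to avoid float underflow.
    return -(log10(pv.numerator) - log10(pv.denominator))


def build_edges(lists: dict[str, set[str]], threshold: float) -> dict[tuple[str, str], float]:
    """Weighted edges between overlapping lists, dropping weights below the threshold."""
    n = len(set().union(*lists.values()))  # total number of users assigned to lists
    edges = {}
    for x, y in combinations(sorted(lists), 2):
        overlap = len(lists[x] & lists[y])
        if overlap == 0:
            continue  # no shared users, so no edge
        weight = log_p_value(len(lists[x]), len(lists[y]), overlap, n)
        if weight >= threshold:
            edges[(x, y)] = weight
    return edges
```

For example, with n = 10 users and two lists of three users that coincide exactly, the p-value is 1/120, giving a weight of roughly 2.08 under the base-10 assumption.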

3.2 Combining Overlapping Communities 

We naturally expect that different topical communities may overlap with one
another. To identify communities of lists, we apply the OSLOM
algorithm, which has been shown to outperform other community finding
approaches [9]. However, as noted in [11], OSLOM can produce unstable results.

Following the CSPA ensemble aggregation approach [13], and the method for
combining network partitions [11], we now describe an approach for generating
and combining an ensemble of overlapping community sets. Given an initial
user list graph $G$, we construct a symmetric $l \times l$ consensus matrix $M$. For the
purpose of generating a collection of $r$ base community sets, we apply the OSLOM
algorithm [9] using a different initial random seed for each run. Motivated by
the notion of an ensemble of weak clusterings [14], we use the "fast" configuration
of OSLOM, which uses a minimal number of optimisation iterations.

After generating a base community set, for each unique pair of nodes $(L_x, L_y)$
in the network, we compute the Jaccard similarity between the sets of community
labels assigned to those nodes by OSLOM. If the pair is not co-assigned to
any community, the score is 0. If the two nodes are assigned to exactly the same
set of communities, the score is 1. However, unlike the binary approach of [11],
if the pair are present in some but not all communities together, the Jaccard
score will reflect this. See Fig. 2 for examples. In the case of non-overlapping
partitions, the score will reduce to the binary scoring used in [11]. After
computing all Jaccard scores, we increment the corresponding matrix entries in $M$.
Note that, by definition, singleton communities are ignored during this
aggregation process.

Fig. 2. Four different cases of computing the Jaccard similarity between the sets of
community labels assigned to two nodes $L_x$ and $L_y$. The Jaccard scores respectively
are: (a) 0.0, (b) 1.0, (c) 1.0, (d) 0.5.
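
The aggregation step for a single base community set can be sketched as follows. This is an illustrative sketch only: the sparse dictionary standing in for the consensus matrix, the function names, and the representation of a base community set as a list of node sets are assumptions for the example, and the call to OSLOM itself is not shown.

```python
from itertools import combinations


def label_jaccard(labels_x: set, labels_y: set) -> float:
    """Jaccard similarity between the community label sets of two nodes."""
    union = labels_x | labels_y
    if not union:
        return 0.0
    return len(labels_x & labels_y) / len(union)


def accumulate(consensus: dict, communities: list) -> None:
    """Add the pairwise Jaccard scores of one base community set to the consensus matrix.

    `consensus` maps a sorted node pair (x, y) to its running total, and
    `communities` is a list of node sets, one per community in this run.
    """
    labels = {}
    for label, members in enumerate(communities):
        if len(members) < 2:
            continue  # singleton communities are ignored
        for node in members:
            labels.setdefault(node, set()).add(label)
    for x, y in combinations(sorted(labels), 2):
        score = label_jaccard(labels[x], labels[y])
        if score > 0:
            consensus[(x, y)] = consensus.get((x, y), 0.0) + score
```

For instance, a node assigned to communities {0, 1} and a node assigned only to community {1} receive a score of 0.5, while two nodes with identical label sets receive 1.0.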



Once all $r$ base community sets have been generated, $M$ is normalised by
$1/r$ to give a matrix with entries $\in [0, 1]$. To find the consensus communities, we
follow a similar approach to that used by [11]. We construct a new undirected
weighted graph such that, for every unique pair of nodes $(L_x, L_y)$, we create an
edge with weight $M_{xy}$ if $M_{xy} > \tau$. The threshold parameter $\tau \in [0, 1]$ controls the
sparsity of the graph. We then apply OSLOM to this graph for a large number
of iterations ($\approx 50$) to produce a final grouping of the user lists in $G$.
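
The normalisation and thresholding of the consensus matrix can be sketched as below. The sparse dictionary representation and the function name are assumptions for the example, and the strict comparison follows the condition as stated above; the final OSLOM run on the resulting graph is not shown.

```python
def consensus_edges(consensus: dict, runs: int, tau: float) -> dict:
    """Normalise accumulated scores by 1/r and keep only edges with weight above tau.

    `consensus` maps a node pair to the sum of its Jaccard scores over all runs.
    """
    edges = {}
    for pair, total in consensus.items():
        weight = total / runs  # entries now lie in [0, 1]
        if weight > tau:
            edges[pair] = weight
    return edges
```

Raising `tau` removes pairs that were only occasionally co-assigned, which yields a sparser consensus graph.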

3.3 Evaluating Community Stability 

When applying community detection, often we may wish to examine the most 
reliable or robust communities with the strongest signals in the network. Here, 
we rank the consensus communities generated as described in Section 3.2, based 
on the cohesion of their members with respect to the consensus matrix M. A 
more stable consensus community will consist of user lists which were frequently 
co-assigned to one or more communities across all r base community sets. 

For a given consensus community $C$ of size $c$, we compute the mean of the
values $M_{xy}$ for all unique pairs $(L_x, L_y)$ assigned to $C$; this stability value
lies in the range $[0, 1]$. We then compute the mean expected value for a community
of size $c$ as follows: randomly select $c$ unique nodes from $G$, and compute their
mean pairwise score from the corresponding entries in $M$. This process is repeated
over a large number of randomised runs, yielding an approximation of the expected
stability value. We then employ the widely-used adjustment technique introduced
by [15] to correct stability for chance agreement:

$$\text{CorrectedStability}(C) = \frac{\text{Stability}(C) - \text{ExpectedStability}(C)}{1 - \text{ExpectedStability}(C)} \tag{3}$$

A value close to 1 indicates a highly-stable community, while a value closer to
0 is indicative of a weak community that appeared intermittently over the $r$ runs.
We rank all consensus communities based on their values for Eqn. 3.
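
A minimal sketch of the chance-corrected stability score of Eqn. 3 is given below. The sparse dictionary keyed by sorted node pairs, the default of 1,000 random samples, the fixed seed, and the handling of the degenerate case where the expected value equals 1 are all assumptions made for the example. Communities are assumed to contain at least two nodes.

```python
import random
from itertools import combinations
from statistics import mean


def mean_pairwise(nodes, consensus: dict) -> float:
    """Mean consensus value over all unique pairs of the given nodes."""
    pairs = combinations(sorted(nodes), 2)
    return mean(consensus.get(pair, 0.0) for pair in pairs)


def corrected_stability(community, all_nodes, consensus: dict,
                        samples: int = 1000, seed: int = 0) -> float:
    """Eqn. 3: stability of a community corrected for chance agreement."""
    rng = random.Random(seed)
    pool = sorted(all_nodes)
    observed = mean_pairwise(community, consensus)
    # Approximate the expected stability from random node sets of the same size.
    expected = mean(
        mean_pairwise(rng.sample(pool, len(community)), consensus)
        for _ in range(samples)
    )
    if expected == 1.0:
        return 0.0  # assumption: treat a fully saturated matrix as uninformative
    return (observed - expected) / (1.0 - expected)
```

A community whose members were always co-assigned has an observed stability of 1 and therefore a corrected score of 1, regardless of the expected value.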



3.4 Selecting Community Labels 

To summarise the content of a consensus community, we aggregate the
meta-information associated with all lists assigned to that community. Specifically, we
construct a bag-of-words model, where each user list is represented by unigrams
and bigrams tokenised from the list's name and description. Single stop words are
removed, and terms are weighted using log-based TF-IDF. For each community,
we then compute the centroid vector corresponding to the mean vector of all lists
assigned to that community. To generate descriptive labels for the community, we
subtract the mean vector of all user list vectors from the community centroid
vector, and rank terms in descending order based on the resulting weights. The
top ranked terms are used as community labels.
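
The label selection procedure can be sketched as follows. This is a simplified illustration: the whitespace tokeniser, the particular log-based TF-IDF variant ((1 + log tf) x log(N / df)), the alphabetical tie-breaking, and all function names are assumptions for the example rather than the exact configuration used here.

```python
from collections import Counter
from math import log


def tokenize(text: str, stop_words: set) -> list:
    """Unigrams (minus single stop words) plus bigrams from a list name or description."""
    words = text.lower().split()
    unigrams = [w for w in words if w not in stop_words]
    bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
    return unigrams + bigrams


def tfidf_vectors(docs: dict) -> dict:
    """Sparse log-based TF-IDF vector for each user list (assumed weighting variant)."""
    n = len(docs)
    df = Counter(term for terms in docs.values() for term in set(terms))
    return {
        name: {t: (1 + log(c)) * log(n / df[t]) for t, c in Counter(terms).items()}
        for name, terms in docs.items()
    }


def mean_vector(vectors: list) -> dict:
    """Mean of a collection of sparse vectors."""
    total = {}
    for vec in vectors:
        for term, weight in vec.items():
            total[term] = total.get(term, 0.0) + weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def community_labels(vectors: dict, community: set, top: int = 3) -> list:
    """Rank terms by community centroid weight minus the overall mean weight."""
    centroid = mean_vector([vectors[name] for name in community])
    overall = mean_vector(list(vectors.values()))
    scores = {t: w - overall.get(t, 0.0) for t, w in centroid.items()}
    return sorted(scores, key=lambda t: (-scores[t], t))[:top]
```

Subtracting the overall mean vector down-weights terms that are common across all lists, so the highest-ranked terms are those most specific to the community.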

3.5 Deriving User Memberships 

The consensus communities generated using the method proposed in Section 3.2 
can potentially provide us with an insight into the overall topics in a Twitter 
corpus. However, it will often be useful to assign community memberships to 
individual users. We can readily produce this by using the list groupings in 
conjunction with the original user list membership information. 

For a given consensus community $C$ of size $c$, we examine the memberships of
all lists in the community. We consider the assignment of a user $u_i$ to each $L_x \in C$
as being a vote with weight $1/c$ for $u_i$ belonging to the overall community. The
total membership weight for $u_i$ is therefore given by the fraction of lists $L_x \in C$
containing $u_i$. Membership weights for all communities are computed in this way.
We can also rank the importance of users in a given community by sorting users
by weight in descending order. To produce a final set of user communities, we
only include a user in a community for which the user has a membership weight
$> \mu$, based on a membership threshold $\mu \in [0, 1]$.
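
The voting scheme for deriving user memberships can be sketched as follows. The function name and the dictionary-of-sets input format are assumptions made for the example.

```python
def user_memberships(community: set, list_members: dict, mu: float) -> dict:
    """Membership weights for users in one consensus community.

    `community` is the set of list names in the community, and `list_members`
    maps each list name to the set of users assigned to that list.
    """
    votes = {}
    for name in community:
        for user in list_members[name]:
            votes[user] = votes.get(user, 0) + 1  # each list casts one vote
    c = len(community)
    # A user's weight is the fraction of lists in the community containing them.
    weights = {user: count / c for user, count in votes.items()}
    return {user: w for user, w in weights.items() if w > mu}


def rank_users(weights: dict) -> list:
    """Users sorted by membership weight, highest first (ties broken by name)."""
    return sorted(weights, key=lambda user: (-weights[user], user))
```

For a community of four lists, a user appearing in all four receives a weight of 1.0, while a user appearing in only one receives 0.25.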

4 Evaluation 
4.1 Data Collection 

To evaluate the proposed community finding methods, we constructed a dataset
based on a list of 499 users curated by The Telegraph, which covers athletes,
journalists, and organisations involved in the London 2012 Summer Olympics.¹
Initially, for each user we retrieved up to their 200 most recent user list
assignments. From this initial pool of lists, we then retrieved list memberships for
10,000 randomly selected lists of size > 5 and containing at least 2 core list
users. This yielded a dataset containing a total of 44,484 individual list
membership records, where the average number of lists per user was 89. The most
frequently listed user was @andy_murray, assigned to 1,931 different lists.



¹ http://twitter.com/#!/Telegraph2012/london2012



4.2 Community Detection 



Using the approach described in Section 3.1, we constructed a user list graph
based on membership information for the 499 users. To limit the density of the
graph, we use a weight significance threshold of $\rho = 6$ (i.e. user list overlaps
are considered as significant for $PV < 10^{-6}$). This resulted in a graph containing
4,948 nodes representing user lists, with 749,062 weighted edges between them.

To generate an ensemble of base community sets, we apply OSLOM as
described in Section 3.2 for 100 random runs, selecting the lowest level of the
hierarchy as the solution for each run. The average number of non-singleton
communities in each run was 157. Combining the base community sets yielded
a consensus matrix containing ≈ 11.8m non-zero values. We examined a range
of threshold values $\tau \in [0.1, 0.5]$, and selected a threshold $\tau = 0.2$ to generate
consensus communities in order to maximise coverage over user lists, while also
reducing the density of the consensus graph. Applying OSLOM to the sparse
graph of ≈ 1.5m values produced a total of 94 consensus communities,
considerably lower than the average base community count. Finally, user communities
were derived using a low membership threshold $\mu = 0.1$ to maximise the number
of core users assigned to communities. In total, 416 core users were assigned to
at least one community, representing 83% of the overall total. Of these, 362 users
were assigned to multiple communities.

Table 1 shows the top 15 communities, arranged in descending order by their
stability score, as defined in Eqn. 3. The table shows the size of each community
(in terms of both number of lists and users assigned), the top text labels selected
for each community, and the three highest-weighted users. We observe that the
most stable communities generally correspond to communities of users involved
in specific, "niche" sports (e.g. badminton, BMX racing, fencing). In these cases,
the top-weighted users correspond to either British Olympic athletes competing
in these sports, or accounts of the official British organisations for these sports.
Interestingly, we also see some unexpected communities with high stability:
a community around the Glasgow 2014 Commonwealth Games, and a community
of celebrities which includes "elite" user accounts with hundreds of thousands
of followers (e.g. @usainbolt, @MichaelPhelps). As stability decreases, we
observe that communities become less homogeneous, covering highly-popular
sports (e.g. football, basketball), or containing users and lists related to several
sports. This suggests that the proposed stability score provides a useful measure of
the homogeneity of topical content for Twitter communities.

Table 1. Top 15 user list communities, arranged in descending order by stability score.

Score  Lists  Users  Top Labels                                     Top Users
-----  -----  -----  ---------------------------------------------  -------------------------------------------------
1.00   17     14     badminton, badminton players, badders          @Jennywallwork, @Nath_Robertson, @ChrisAdcock1
1.00   5      5      bmx, bmx racing, bmx atleti                    @ShanazeReade, @liamPHILLIPS65, @bloomy181
1.00   32     11     sailing, sailors, Olympic                      @SkandiaTeamGBR, @AinslieBen, @matchracegirls
1.00   19     24     fencing, fencers, individuele schermers        @britishfencing, @CBennettGBR, @LaurenceHalsted
1.00   6      5      triathlon, machines, swim run                  @AliBrownleetri, @jodieswallow, @MarkCavendish
1.00   5      21     scots, red sky, 2014                           @mj88live, @RobbieRenwick, @Euan_Burton
1.00   22     4      wielrennen, ciclismo, cycling                  @GeraintThomas86, @UCI_cycling, @MarkCavendish
0.98   5      7      track, field, track field                      @allysonfelix, @TysonLGay, @tiffofili
0.98   48     19     rowing, rowers, gb rowing                      @andrewthodge, @ZacPurchase, @MarkHunterGB
0.97   14     22     diving, tuffi, Olympic diving                  @PeterWaterfield, @matthew_mitcham, @toniacouch
0.96   36     44     hockey, hockey players, field hockey           @AlexDanson15, @RichM6, @jfair25
0.96   12     21     canoe, canoeing, canoe slalom                  @GBcanoeing, @PlanetCanoe, @edmckeever
0.93   5      5      actors athletes, internet stars, athletes tmz  @usainbolt, @ShawnJohnson, @MichaelPhelps
0.92   27     13     judo, judo clubs, judo related                 @BritishJudo, @USAJudo, @IntJudoFed
0.87   6      7      runners, hardlopen, runners world              @Mo_Farah, @paulajradcliffe, @KenenisaBekele

Many of the top labels selected for communities are multi-lingual. For
instance, the label for the "cycling" community in Table 1 contains terms in
Dutch, Italian, and English. Unlike in certain textual analyses of tweets, the
use of list membership information allows us to identify groups of users in a
language-agnostic manner.

4.3 External Validation 

To validate the consensus user communities that were identified by aggregating
list information, we use a set of fine-grained Olympics lists also produced by
The Telegraph², consisting of Twitter users associated with individual sports
(e.g. "archery", "equestrianism"). This provided us with 18 external "ground
truth" categories, covering 423 of the 499 users in the dataset.

We computed precision, recall, and F1 scores for all communities, and
subsequently matched categories to communities based on precision. Table 2 shows
the resulting scores for all categories, arranged in descending order by precision.
Communities produced by user list aggregation allowed us to identify eight
categories with precision > 0.9, while generally maintaining high recall. Only in
the case of four categories did the proposed approach lead to precision and
recall scores that were both < 0.5. Subsequent examination of the data suggests that list
information was relatively sparse for these categories, and that the users were
generally assigned to more generic lists (e.g. "aquatics" for "waterpolo"). In the
case of the "cycling" category, which received high precision but low recall, we
note that many of the lists we collected related to road racing, whereas many of
the users in the category are associated with Olympic velodrome racing.

Table 2. Validation scores achieved relative to 18 "ground truth" categories.

Category Name     Category Size  Precision  Recall  F1
----------------  -------------  ---------  ------  ----
judo              20             1.00       0.65    0.79
basketball        26             1.00       0.50    0.67
rowing            44             1.00       0.43    0.60
athletics         50             1.00       0.22    0.36
cycling           28             1.00       0.14    0.25
hockey            47             0.98       0.91    0.95
diving            23             0.95       0.91    0.93
equestrianism     18             0.94       0.83    0.88
fencing           23             0.88       0.91    0.89
sailing           16             0.82       0.56    0.67
gymnastics        24             0.77       0.42    0.54
canoeing          22             0.76       0.73    0.74
beach-volleyball  12             0.55       1.00    0.71
boxing            22             0.55       0.55    0.55
swimming-syncrho  16             0.33       0.13    0.18
weightlifting     6              0.20       0.17    0.18
archery           17             0.20       0.06    0.09
waterpolo         22             0.05       0.05    0.05

² http://twitter.com/#!/Telegraph2012/lists
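
The validation scores can be computed along the following lines. This is a minimal sketch in which the function names, the set-based inputs, and the handling of ties when matching (the first maximal community is chosen) are assumptions for the example.

```python
def precision_recall_f1(community: set, category: set) -> tuple:
    """Precision, recall and F1 of a user community against a ground truth category."""
    shared = len(community & category)
    precision = shared / len(community) if community else 0.0
    recall = shared / len(category) if category else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def match_by_precision(category: set, communities: dict) -> str:
    """Name of the community with the highest precision for the given category."""
    return max(
        communities,
        key=lambda name: precision_recall_f1(communities[name], category)[0],
    )
```

As a worked example, a community of 13 users that all belong to a category of 20 users has precision 1.00 and recall 0.65, giving F1 = 2(1.00)(0.65) / 1.65 ≈ 0.79, which corresponds to the values in the "judo" row of Table 2.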

5 Conclusions 

In this paper, we have presented initial work on the idea of identifying topical
communities on Twitter by aggregating the "wisdom of the crowds", as encoded
in the form of user lists. We show that this information can be mined to detect
and label coherent overlapping clusters of both lists and users.

While the evaluation in this paper used a fixed network of users, a similar
approach could be applied to identify topical sub-communities around trending
terms or hashtags, by compiling a network of users frequently mentioning these
terms. Also, in some cases, some users will not have been assigned to any user
lists. We suggest that a classification process, using an alternative network view
(e.g. follower links), could be used to assign such users to communities.